Statistical language model based on a hierarchical approach: MC^n_v

Authors

  • Imed Zitouni
  • Kamel Smaïli
  • Jean Paul Haton
Abstract

In this paper, we propose a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MC^n_v, where n is the maximum number of words in a sequence and v is the maximum number of levels. The originality of this model lies in its capacity to take into account dependent variable-length sequences for very large vocabularies. To discover the variable-length sequences and to build the hierarchy, we use a set of 233 syntactic classes extracted from the 8 elementary grammatical classes of French. The MC^n_v model learns hierarchical word patterns and uses them to reevaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model was trained on a corpus of 43 million words extracted from a French newspaper and uses a vocabulary of 20,000 words. Tests were conducted on 300 sentences. The model achieved a 17% decrease in perplexity compared to an interpolated class trigram model. Rescoring the original n-best hypotheses resulted in a 5% improvement in accuracy.
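
The two evaluation ideas mentioned in the abstract, perplexity and n-best rescoring, can be illustrated with a minimal sketch. This is not the authors' MC^n_v model or the MAUD interface: the toy unigram table, the function names, and the `lm_weight` parameter are all hypothetical stand-ins chosen only to show the general mechanics.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def rescore_nbest(hypotheses, lm_logprob, lm_weight=0.5):
    """Rerank (text, acoustic_logprob) pairs by a combined score.

    The combined score is acoustic_logprob + lm_weight * lm_logprob(text);
    this linear combination is an assumption, not the paper's formula.
    """
    scored = [
        (acoustic + lm_weight * lm_logprob(text), text)
        for text, acoustic in hypotheses
    ]
    scored.sort(reverse=True)  # best combined score first
    return [text for _, text in scored]

# Hypothetical unigram probabilities standing in for a real language model.
TOY_LM = {"le": 0.2, "chat": 0.05, "dort": 0.02, "chah": 0.0001}

def toy_lm_logprob(text):
    """Sum of unigram log probabilities; unseen words get a small floor."""
    return sum(math.log(TOY_LM.get(word, 1e-6)) for word in text.split())

# The recognizer slightly prefers the wrong hypothesis acoustically.
nbest = [("le chah dort", -10.0), ("le chat dort", -10.5)]
reranked = rescore_nbest(nbest, toy_lm_logprob)
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

Here the language model score outweighs the small acoustic advantage of the first hypothesis, so "le chat dort" moves to the top of the list. A model that assigns probability 0.25 to every word has a perplexity of 4, which is why a lower perplexity indicates a model that is less uncertain about the test text.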





Publication date: 2001